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Abstract 

We are concerned with the detection of associations between random vectors 
of any dimension. Few tests of independence exist that are consistent against all 
dependent alternatives. We propose a powerful test that is applicable in all dimen- 
sions and is consistent against all alternatives. The test has a simple form and is 
easy to implement. We demonstrate its good power properties in simulations and 
on examples. 

1 Introduction 

In modern applications, there is need to test for independence between random vectors. 
One example from genomics research is whether two groups of genes are associated. 
Another application is functional magnetic resonance imaging research, where voxels 
in the brain are measured over time under various experimental conditions, and it is of 
interest to discover whether sets of voxels that comprise different areas in the brain are 
functionally related. 

Let X G W and Y G 5ft 9 be random vectors, where p and q are positive integers. 
We are interested in testing whether there is a relationship between the two vectors X 
and Y . The null hypothesis states that the two vectors are independent, 

Hq : Fxy = FxFy, 

where the joint distribution of (X, Y) is denoted by Fxy, and the distributions of X 
and Y, respectively, by Fx and Fy. We are interested in the general alternative that 



1 



the vectors are dependent, 

Hi : Fxy 7^ FxFy- 

There are N independent copies (xj, yi), i = 1, . . . , N from the joint distribution of X 
and Y for testing Ho- The dimensions of the vectors p and q may be much higher than 
N. 

The purpose of this paper is to provide a powerful test of independence that is 
applicable in all dimensions, and is consistent against all alternatives. The test is 
based on the pairwise distances between the sample values of X and of Y respec- 
tively, {dx(xi,Xj) : i,j e {1,...,N}}, {d Y {y % ,yj) : i,j £ {1,...,N}}. The only 
restriction on the distance metrics dx(-, ■) and dy (•, •) is that they are determined by 
norms. The test statistic is a function of ranks of these distances, and it can be expressed 
simply in closed form. It is proven to be consistent against all dependent alternatives. 

Few multivariate tests of independence that are consistent against all alternatives 
are available to date. iFukumizu et al.l (12008b suggest a test based on normalized cross- 
covariance operators on reproducing kernel Hilbert spaces. Bickel and Xul (120091) offer 
a test based on an approximation of Renyi correlation, since there is no explicit formula 
to compute the Renyi correlation. A very elegant test with a simple formula is provided 



Szekelv et al.l (120071) . and has been further investigated in lSzekelv and Rizzo (2009) 



and in the discussions that followed it. We revisit some of the examples o flSzekelv et al 



d2007l) . and add new examples. In the exam ples considered ou r new test performs 
remarkably well in comparison to the test of ISzekelv et al.1 (12007b . 



2 The new test of independence 

This section develops the new test of independence. To motivate the test, note that 
if X and Y are dependent and have a continuous joint density, then there exists a 
point (xq, yo) in the sample space of (X, Y), and radii R x and R y around xq and yo, 
respectively, such that the joint distribution of X and Y is different than the product of 
the marginal distributions in the cartesian product of balls around (xo,yo)- Consider 
first an oracle that guesses such a point (xo, yo) and radii R x and R y . 

Let d(-, •) be the norm distance between two sample points, either in X or in Y, 
so the distance between the vectors Xi and xj from the distribution of X is d(xi,Xj), 
and similarly the distance between the vectors yi and yj from the distribution of Y 
is d(yi, yj). Technically, this distance may be different for the samples of X and for 
the samples of Y, but we omit this distinction for simplicity of notation. Consider the 
following two dichotomous random variables: I{d(xo, X) < R x } and I{d(yo, Y) < 
R y }, where /(•) is the indicator function. We summarize the observed cross-classification 
of these two dichotomous random variables for the N independent observations k £ 
{l,...,iV} in Table [Q where A n = Y*=i I i d ( x o, %k) < Rx}I{d(y ,yk) < Ry}, 
An, A 2 i, A 2 2, defined similarly, and A m ., A. m m — 1,2, are the sum of the row or 
column, respectively. 

Evidence against independence may be quantified by Pearson's chi-square test 
statistic, or the likelihood ratio test statistic, for 2x2 contingency tables. The test 
based on such a statistic is consistent, and its power for finite sample size depends on 
the choice of (xq, yo), R x and Ry. 
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Table 1: The cross-classification of I{d(xo, X) < R x } and I{d(yo, Y) < R y } 





d(yo, •) < Ry 


d(yo,-) > R y 




d{x , •) < R x 


Ai 


A12 


A. 


d(x , •) > R x 


A 2 i 


A22 


A. 




Ai 


A.2 


N 



Table 2: The cross-classification of J{d(x,,X) < d(xj,Xj)} and 7{d(yj,y) < 
d(Vi,Vj)} 





d{y t , ■) < d(yi,yj) 


> d(yi,yj) 




d(xi, •) < d(xi,Xj) 


A n (i,j) 


Ai 2 (i,j) 


A.(i,j) 


d(xi, •) > d(xi,Xj) 


Ai(i,i) 


A 22 {i,j) 


A.(i,j) 




Ai(i,j) 


A 2 (i,i) 


iV-2 



Since we do not have an oracle that guesses well (xo, yo), R x and i? y , in the sense 
that the test for independence by a 2 x 2 contingency tables will be powerful, we let the 
data guide us in these choices. For every sample point i, we choose it in its turn to be 
(xo,yo)- For every sample point j ^ i, we choose it in its turn to define R x = d(xi,Xj) 
and R y — d(yi, yj). The 2x2 tables now comprise the remaining N — 2 points. The 
test aggregates the evidence against independence by summing over all N(N — 1) test 
statistics from the 2x2 tables thus created. 

Specifically, for fixed observations i and j, consider the dichotomous random vari- 
ables: I{d(xi,X) < d(xi, Xj)} and I{d(yi, Y) < d(yt 1 yj)}. Table |2] summarizes 
the observed cross-classification of these two dichotomous random variables for the 
N — 2 independent observations k 6 {1,. .. , N}, k ^ i,k ^ j, where An(i,j) = 
EfeLi,fc#i,fe^j I{d{Xi,x k ) < d(x i ,x j )}I(d{y l ,y k ) < d{ yi , yj )}, A 12 ,A 2 i,A 2 2 de- 
fined similarly, and A m ., A. m , m — 1,2, are the sum of the row or column, respec- 
tively. 

Let 

q( . (iV - 2){A 12 (i,j)A 21 (i,j) - A n (t, j)A 22 (t, j)} 2 
[h3> Ai.(i, j)A 2 .(i, j)A.i(i, j)A. 2 (i, j) 

This is the classic test statistic for Pearson's chi square test for 2x2 contingency tables. 

To test for independence between the two random vectors X and Y, we suggest as a 
test statistic T = S^=i For i and j with in at least one of the margins, 

we set S(i,j) = 0. The p-value from the permutation test based on the statistic T is the 
fraction of replicates of T under random permutations of the indices of the Y sample, 
that are at least as large as the observed statistic. 

We say a point (xq, yo) is a point of dependence if the joint density of X and Y 
is different than the product of the marginal densities of X and Y at (xo, yo), defined 
formally in equation (fl~|i in the Appendix for the mixed case where the coordinates 
may be both discrete and continuous. Theorem 12. 1 1 states that the test is consistent 
for discrete random vectors with countable support, as well as for continuous random 
vectors, and for random vectors where some of the coordinates are discrete and others 
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continuous, if the density of the continuous random vectors is continuous around a 
point of dependence. 

Theorem 2.1 For dependent random vectors (X, Y), X € W and Y 6 -ft 9 , denote the 
discrete and continuous coordinates of X by u C {1, . . . , p} and v — u c , respectively, 
and similarly the discrete and continuous coordinates of Y by s C {1, . . . , q] and 
t = s c , respectively. The permutation test based on the statistic T, with distances 
dx{', •) and dy •) determined by norms, is consistent if either 

1. X and Y are continuous, i.e. u and s are empty sets, and there exists a point of 
dependence [xq, yo) for which the joint density is continuous. 

2. At least one of X or Y has discrete coordinates in addition to the continuous 
coordinates, i.e. at least one of u and s is non-empty and both v and t are 
non-empty, and there exists a point of dependence (xo,yo) for which (i) there 
exists a ball around the atom {xq(u), yo(s)} that contains only this atom, and 
( ii) the joint density of the continuous coordinates conditional on the discrete 
coordinates is continuous. 

3. Both X and Y are discrete, i.e. v and t are empty sets. 

4. X is discrete and Y is continuous, i.e. v and s are empty sets, and there exists a 
point of dependence (xo,yo) for which the conditional density ofY given X is 
continuous. 

See Appendix for a proof of case 2. The proofs of the other cases are very similar yet 
simpler, and they are given in the Supplementary Material. 

2.1 Computational Complexity 

For TV sample points, the naive implementation of the test will require an order of 
magnitude of TV 3 operations. We provide an algorithm to efficiently calculate the score 
T in order of magnitude iV 2 log N. This is done by providing an algorithm which for 
a given i calculates {S(i,j) : j = 1, . . . , N, j ^ i} in order of magnitude iVlogiV. 
We shall show that we can calculate {Au(i, j), A\ 2 {i, j), A 2 i(i, j), A 2 2{i 1 j) ■ j — 
1,... , N, j ? i} in 0(N log N). 

For fixed i, let us look at all the distances from sample i according to X and let us 
sort the samples according to distance. Without loss of generality, renumber the indices 
of the N — 1 sample points other than i to be 1, . . . , N— 1, so that the jth observation is 
the jth nearest to i in X. Denote the order of the distance from i in Y by 7r(l) • ■ ■ ir(N— 
1). So the jth observation is the 7r(j)th nearest to i in Y. tt(-) is a permutation of 
1, . . . , N— 1. The entries in the above Table|2]may be expressed as a function of j, 
and inv(j), where inv(j) is defined as the number of inversions of j in the permutation 
7r, i.e. inv(j) is the number indices k G {1, ... ,j — 1} such that n(k) G {^(j) + 
1, . . . , N — 1}. From the definition of Ai 2 (i,j) it follows that Ai 2 (i,j) = inv(j), 
and similarly A 22 (i,j) = N — tr(j) — inv(j). Since Ai.(i,j) = j — 1, the remaining 
counts of the 2x2 contingency table for S(i,j) are An = j — 1 — inv(j), A 2 \ — 
ir(j) + inv(j) — j — 1. Therefore, it is enough to show that each of the following steps 
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Table 3: The power (SE x 100) for a test at level 0.05 from a sample of size N = 50 
from unusual bivariate relations. The results are based on 1000 simulations for rows 
1 — 5 and on 50000 simulations for the null setting in row 6. 



Distribution 


Dcov 


new test 


W 


0.853 (1.1) 


1.000(0.0) 


Diamond 


0.037 (0.3) 


0.662(1.5) 


Parabola 


0.975 (0.5) 


0.998 (0.1) 


2 Parabolas 


0.303 (1.4) 


1.000(0.0) 


Circle 


0.000 (0.0) 


0.993 (0.3) 


4 independent clouds 


0.050 (0.1) 


0.050 (0.1) 



takes order of magnitude NlogN: (1) renumber the indices according to increasing 
distance in X from i; (2) compute {tt(j) : j = 1, . . . , N, j ^ i}; (3) compute {inv(j) : 
j = 1, . . . ,N, j ^ i}. Since sorting takes order of magnitude iVlogiV, steps (1) 
and (2) are performed in the required computational time. It remains to show that 
(3) can be computed in order of magnitude NlogN. We show the algorithm in the 
Supplementary Material. 



3 Simulations 

In the simulations , we c ompare the performance of our test and the dCov test of 



Szekelv and Rizz o (2009). We chose the latter test for two reasons. First, it is the only 



consistent test of simple form that is availab le. Second, the superiorit y of the dCov 
test ov er classical tests in Puri and Sen! ( 197lh has been demonstrated in Szekelv et al 



d2007l) . Moreover, our aim is to investigate the performance of our test for non- 



monoton e relationships, and th ese classical tests, or related tests for higher dimensions 
found in Taskinen et al. (2005), are ineffective for testing non-monotone types of de- 
pendence (iSzekely et all 120071) . 

In all simulations, the dCov test was applied by calling th e function dcov.test im- 



plemented in the R package energy ( Szekelv and Rizzoi 20091) with 10000 permutation 



samples. The Euclidean distance was used as a distance metric. 

We conside r first the six simulated examples of unusual bivariate distributions in 
Newton (2009). These examples mimic those at the wikipedia . orgj page on Pear- 



son correlation, see Supplementary Material for details. The example of 4 indepen- 
dent clouds is an example of a null distribution. Table [3] shows the power comparison 
between dCov and the new test for N = 50 sample points and a significance level 
a = 0.05. Large differences are observed. The most pronounced difference is ob- 
served for the circle relation, where the power of the new test is 0.993 yet dCov has 
no power to detect the relation. For the diamond relation, the new test has a power 
of 0.662 yet the power of dCov is 0.037. The tests based on Pearson and Spearman 
cor relations had a powe r of at most 0. 16 in all examples. 



Szekelv et al.1 d2007l) considered multivariate examples and compared them to like- 
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Table 4: The power (SE x 100) of a test at level 0.05 per sample size from a 
5 dimensional joint distribution, where X ~ -W(0, isxs) and Y — log(X 2 ) or 
Y = (Yi, . . . , Y$) has coordinates Yj = Xj ■ ej, where ej ~ iV(0, 1) independent 
of Xj. The results are based on 1000 simulations. 





Y = \og(X 2 ) 


Yi = 




Sample size 


dCov new test 


dCov 


new test 


N=20 


0.172(1.2) 0.299(1.4) 


0.335 (1.5) 


0.554(1.6) 


N=30 


0.290(1.4) 0.595(1.6) 


0.384(1.5) 


0.792(1.3) 


N=40 


0.436(1.6) 0.819(1.2) 


0.417 (1.6) 


0.920 (0.9) 


N=50 


0.629(1.5) 0.945(0.7) 


0.443 (1.6) 


0.968 (0.6) 



Table 5: The power (SE x 100) of a test at level 0.05 per sample size from a 5 di- 
mensional joint distribution, where Yj — P\Xj + P2X 2 + ej,j = 1, ...,mi and 
Yj = 6j,j = mi + 1, . . . , 5, with ej ~ N(0, a 2 ) independent of Xj ~ N(0, 1). The 
results are based on 1000 simulations. 











dCov 


new test 


mi 


Pi 


ft 


a 2 


N=20 


N=30 


N=20 


N=30 











1 


0.040 (0.6) 


0.047 (0.7) 


0.051 (0.7) 


0.047 (0.7) 


2 


1 


4 


9 


0.501 (1.6) 


0.637(1.5) 


0.669(1.5) 


0.984 (0.4) 


2 


3 


2.5 


9 


0.841 (1.2) 


0.963 (0.6) 


0.706 (0.5) 


0.998(0.1) 



lihood ratio type of tests. In the following two examples from ISzekelv et al.l d2007l) . 
none of the likelihood ratio type of tests considered performed well. Using our no- 
tation, the distribution of X = (X\, . . . , X5) is standard multivariate normal with 5 
dimensions. First, let Y be equal to \og(X 2 ). Columns 2 and 3 of Table|4] shows the 
power of a test at level 0.05 for dCov as well as for the new test. The new test has a 
power of 0.82 for N = 40 sample points, whereas the power of dCov is 0.436. Second, 
let Y = (Yi , . . . , Y5) have coordinates Yj = Xj ■ Cj, where tj are independent standard 
normal variables and independent of Xj. Columns 4 and 5 of Table [4] show the power 
of a test at level 0.05 for dCov as well as for the new test. The new test has a power of 
0.968 for N = 50 sample points, whereas the power of dCov is 0.443. 

A more sophisticated scenario, which includes both a monotone and non-monotone 
component, is the following: Yj = /3iXj+j32Xj+ej,j = 1, . . . , mi and Yj = 6j,j — 
mi + 1, . . . , 5, with tj - 7Y(0, a 2 ) and Xj - 7Y(0, 1) for all j. Table |7| shows the 
power of a test at level 0.05 for dCov as well as for the new test for various values 
of /?i,/32,cr 2 , mi £ {0,2}. Further results in 100 dimensions are included in the 
Supplementary Material. When (3 2 is large relative to fti, the power of the new test is 
better than that of dCov. 

Finally, we consider an example where X and Y are both of dimension 1000, from 
a mixture distribution with 10 equally likely components. In the zth component, i £ 
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Table 6: The power (SE x 100) of a test at level 0.05 per sample size from the joint 
distribution of 10 mixture components for random vectors of dimension 1000, each 
component is centered around a different mean and is either multivariate Cauchy or 
multivariate t with 3 degrees of freedom. The results are based on 200 simulations. 



Sample size 


t(3df) 
dCov new test 


Cauchy 
dCov newtest 


N=50 
N=100 
N=200 
N=300 


0.100(2.1) 
0.190 (2.8) 
0.345 (3.4) 
0.620 (3.2) 


0.570 (3.5) 
0.980(1.0) 
1.000 (0.0) 
1.000 (0.0) 


0.040(1.4) 
0.050(1.5) 
0.075 (1.9) 
0.020(1.0) 


0.130 (2.4) 
0.185 (2.7) 
0.390 (3.5) 
0.580 (3.5) 



{1, . . . , 10}, (X, Y) are the random variables {/^(i) + e, Hy(i) + v}> where fi x (i) 
and (iy(i) are sampled (once) from the 1000 dimensional multivariate standard normal 
distribution, and (e, rj) are sampled independently from the multivariate Cauchy or 
multivariate t with 3 degrees of freedom, with the identity correlation matrix. The 
dependency of X and Y is through the fixed pairs {/j, x (i), n v (i)}, i — 1, . . . , 10 such 
that the data consists of 10 clouds around these pairs. See Supplementary Material 
for details. Table [6] shows the power of a test at level 0.05 for dCov as well as for 
the new test. The new test has a power of one for N = 200 sample points in the 
multivariate t distribution, whereas the power of dCov is 0.23. For the multivariate 
cauchy distribution, dCov has no power even at N = 300, as expected s i nce d Cov is 



consistent only for distributions with finite first moments (ISzekelv et al.L 120071) . The 



power of the new test is 0.58 for N — 300 sample points. Moreover, for the multivariate 
normal distribution, the power for both tests is one for N = 50 sample points. 



4 An example 

In a homogeneous population, the dependence between single nucleotide polymor- 
physms (SNPs) on the same chr omosome is weaker the f arther the SNPs are from 
each other due to recombination dLander and Schork , 11994 . A question of interest is 



whether SNPs across chromosomes are independent. To answer this question we ex 
amined the DNA of a sample of 97 unrelated individuals of Han Chinese in Beijing 



China , available from the HapMap project (IThe Internationa l HanMa p Consortium , 
120031) . This sample is regarded to be of relatively homogeneous ancestry, since donors 
were required to have at least three Han Chinese grandparents. For the purpose of this 
example, we limit ourselves to chromosomes 21 and 22 and ask whether the SNPs on 
chromosome 2 1 are independent of the SNPs on chromosome 22. We first preprocessed 
the data by removing subjects with more than 30% missing SNPs on a chromosome, 
SNPs with missing subjects, and SNPs with minor allele frequency below 0.05. After 
preprocessing, 43 subjects remained. For each subject we had a vector of dimension 
31,858 of SNPs from chromosome 21, and a vector of dimension 36,264 of SNPs from 
chromosome 22. The Euclidean distance was used as a distance metric. Our proposed 
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test was highly significant, with a p-value below 1 x 10 . The dCov test was also 
significant, with ap-value of 6 x 10 -4 . 

5 Final remarks 

Pearson's chi-squared test statistic was originally proposed as an approximation to the 
log-likelihood ratio statistic, in our context 

S LR (i, j) = 2 £ £ A kl (i, J) log[A kl (iJ)/{^M^^-}]. 

k=l 1 = 1 

An alternative test statistic for independence may therefore be Tlr = ^2j=i Slr('< 

In the simulation results considered, the permutation test with this test statistic had very 
similar power to the power of the suggested test. 

After discovering that the random vectors are dependent, a natural question to ask 
is which sub-vectors are dependent. This can be done using mult iple comparis ons pro- 



cedures, similar to post-hoc testing in the analysis of variance ( Scheffe , ll959h . More- 
over, the larger the value of S(i, j), the stronger the dependence between the variables 
I{d(Xi,X) < d(xi,Xj)} and I{d(yi,Y) < d(y l ,y j )}. Informally, if S(i,j) is large 
and d(xi,Xj) and yj) are small, this suggests that the random vectors X and Y 
are dependent in balls of size d(xi,Xj) and d(yi,yj) around Xi and yi. We plan to 
explore methods of localizing the dependency in future work. 
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Supplementary material 

Supplementary material includes the proofs of cases 3 and 4 of the theorem, the algo- 
rithm for implementing the test in order of magnitude A 2 log(A), further simulations, 
and an additional one-dimensional real data example. 



Appendix 



We shall prove Theorem 12. 11 for the case where the index sets u, v, s, t are all non- 
empty, since it is straightforward to adapt the proof to the cases where u or s are empty 
sets. 
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From henceforth, for notational convenience we shall repress the conditioning event 
and denote the joint and marginal densities conditional on the discrete coordinate 
values as h{x(v),y(t)}, f{x(v)}, and g{y{t)} in place of h{x(v),y(t) | X(u) — 
x(u),Y(s) = y(s)}J{x(v) | X{u) = x(u)}, and g{y(t) \ Y(s) = y{s)}. More- 
over, we denote p{x(u), y(s)} = Pr{X(u) = x(u),Y(s) = y(s)},p{x(u)} — 
Pr{X{u) = x(u)}, <mdp{y(s)} = Pr{Y(s) = y(s)}. 

If Hq is false, and the point of dependence (xq, yo) satisfies properties (i) and (ii) 
of Theorem |2.1| Without loss of generality, suppose 

p{x Q {u), y (s)}h{x (v),yo(t)} > p{xo(u)}f{x (v)}p{yo(s)}g{y (t)}. (1) 

Let Rd be a positive constant smaller than both the radius of the ball around xq (it) 
that contains only xo(u), and the radius of the ball around yo(s) that contains only 
the point yo(s). Then the set {(x, y) : d(x, xo) < Rd, d(y, yo) < Rd} contains only 
points with discrete coordinates x(u) — xo(u),y(s) = yo(s). Moreover, since the 
joint density conditional on {xq(u), yo(s)} is continuous, there exists a radius R c such 
\hatp{x (u), y (s)}h{x(v), y(t)} > p{x {u)}f{x(v)}p{y (s)}g{y(t)} for all points 
(x, y) in the set {(x, y) : d(x, x ) < R c , d(y, y ) < R c , x(u) = x (u),y(s) = y (s)}. 
Let R = mm{Rd,R c } and A = {{x,y) : d(x,xo) < R,d(y,y ) < R}- Then the 
set A has positive probability, for all points (x, y) £ A the discrete coordinates are 
x(u) = x (u) and y(s) = yo(s), and moreover 

mm[p{x(u),y(s)}h{x(v),y(t)} - p{x(u)}f{x(v)}p{y(s)}g{y(t)}} > 0. 

Denote this minimum by the positive constant c. 

Clearly the following two subsets of A have positive probability as well: 

Ai = {(x,y) : d(x,x ) < R/8,d(y,y ) < R/8} 

and 

A 2 = {(x, y) : 3i?/8 < d(x, x ) < R/2, 3R/8 < d(y, y ) < R/2}. 

Denote the probabilities of Ai and A2 by /1 and /2 respectively. Therefore, we expect 
(Nfi)(Nf2) pairs of sample points i and j such that (xi,yi) £ Ai and (xj, yj) G Ai- 
For these sample points i and j, 

3-R/8 < d(xj,xo) < d(xj,Xi) + d(xi,xo) < d(xj, Xi) + R/8 (2) 

where the second inequality is the triangle inequality, and the first and third inequalities 
follow since (xj, yj) £ A2 and (xi,yi) £ Ai. It follows from (O that 

d( Xi ,Xj) > R/4, d{ Vi , Vj ) > R/A. (3) 

Moreover, if a sample point k is closer to i than to j both in the X vector and in the 
Y vector, then it is within the x and y spheres of radius R: 

Lemma .1 If d(xk,Xi) < d(xi,Xj), thend(xk,xo) < R. Similarly, ifd(yk,yi) < 
d{yi, Vj), then d(y k , y ) < R. 
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Proof: Since the proof follows the same steps for x k and y k , we only show it for the x 
coordinates. The result follows by applying the triangle inequality several times, 



d{x k ,x ) < d{x kl Xi) +d(xi,x ) < d(xj,Xi) +d(x u x ) 

< d(xj,x ) + 2d(x t ,x ) < R/2 + 2R/8 = 6R/8 < R. 

The consequence of Lemma [T|is that for all such samples k, (x k ,y k ) £ A 
Moreover, all points that are within the x and y spheres of radius R/8 are closer to 
i than the point j: 

Lemma .2 If d{x k ,Xa) < R/8, then d(x k ,Xi) < d(Xi,Xj). Similarly, if d(y k ,yo) < 
R/8, then d(y k ,y l ) < d{y l ,y j ). 

Proof: Since the proof follows the same steps for x k and y k , we only show it for the 
x coordinates. Applying the triangle inequality, d(x kl Xi) < d(x k ,xo) + d(xi,Xo) < 
R/8 + R/8 = R/4. The result follows from ©. Therefore, if (x k ,y k ) £ Ax, then k 
is closer to i than to j in both X and Y, 

By the law of large numbers, almost surely 

lim Al '^ j) = p{x (u),y (s)} [ h{x(v),y(t)}dx(v)dy(t) (4) 

lim ^j^l= p {x (u)} [ f{x(v)}dx(v) (5) 
lim^M.p^)}/ g{y(t)}dy(t) (6) 

N-+oo N - 2 J As 

where A 3 = {(x,y) : d(x,x. t ) < d{x ll x J ),d(y,y l ) < d(yi,yj)}, A4 = {x : 
d(x,Xi) < d(xi,Xj)}, and A 5 = {y : d(y,y t ) < d(yi,yj)} . 

Recall that S(i, j) = £* =1 E? =1 {4y (*, j) ~ A k . (i, j) A,(i, j)/(N - 2)} 2 /{A k . (i,j)A 4 (i, j)/(N- 
2)}. It is enough to look at the term with I — 1 and k — 1 in S(i, j), i.e. the term 

q( . {AnjiJ) - AUiJ^jijyjN - 2)Y 
l[hJ> A 1 .(i,j)A. 1 (i,j)/(N-2) 

It follows that S > S±(i,j), and therefore that our test statistic T > E i=1 J^f^i 
By Slutzky's theorem and the continuous mapping theorem, almost surely 

j. giM u 1 {A 11 (»,j)-A.(»,j)A 1 (z,j)/(jV-2)} 2 

^"^^-2 ^"^^-2 Ai.(i,j')Ai(i,j)/(iV-2) 

(Jx 3 bfaM- yoQOMafo). y(*)> - P{xo(u)}f{x(v)}p{yo(s)}g{y(t)}}dx(v)dy(t)) 2 
~ J^\p{xo{u)}f{x(v)}p{y (8)}g{y(t)}]dx(v)dy(t) 

We shall show that this limit can be bound from below by a positive constant that 
depends on (xq, yo) but not on i and j. From Lemma [T] it follows that A3 C A and 
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from Lemma[2]it follows that A\ C A3, and therefore a positive lower bound on the 
numerator of © can be obtained: 

/ \p{x (u),y (s)}h{x(v), y{t)} - p{x Q (u)}f{x(v)}p{y Q (s)}g{y(t)}]dx(v)dy(t) 

> c / dx(v)dy(t) > c / dx(v)dy{t). 

J A 3 J Ax 

Moreover, £ 4a {p{a;o(w)}/{a;(w)}p{yo(s)}5{y(i)}}rfa ; (w)<iy(i) < 1. Therefore, de- 
noting the lower bound by d = {c dx(v)dy(t)} 2 , it follows that S\(i, j) / (N — 2) 
converges almost surely to a constant larger than c' > 0. Therefore, S\(i,j) > 
(N — 2)c'/2 with probability going to 1 as N — > 00. Since, moreover, the num- 
ber of pairs of points i and j such that (xi, j/j) G and (xj,yj) G A2, divided by 
Zi/2^ 2 , converges almost surely to 1, it follows that there exists a constant (5 such that 
lim^o. Pr(T >SN 3 ) = 1. 

Under the null hypothesis, for large enough sample size N, S(i,j) is distributed x 2 
with 1 degree of freedom. Therefore, the null expectation of T is approximately N(N— 
1), and the null variance is bounded above by a term of order N 4 (more precisely, by 
{N(N ~ 1)} 2 2). Since J2^Li J2?=i s (hj) is of order of magnitude of iV 3 , it follows 

that T will be rejected with probability 1 . 
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A Supplementary Material 

A.l Proofs 

The proof of case 1 is omitted, since it is very similar to the more complex case 2. The 
proofs of the countable case 3, and the mixed case where one random vector is discrete 
and the other continuous, are given, respectively, in Sections [A. 1 . 1 | and |A. 1 ,2| below. 

A.l.l Proof of the countable case 3 

Suppose X 6 W and Y £ are both discrete with countable support. Ho is false 
implies that there exists at least one pair of atoms (xo,yo) such that Pr(X = xq, Y — 
yo) > Pr(X = xo)Pr(Y = yo). We expect NPr{X — Xq, Y = yo) points to have 
values (xq, yo). Let i and j be two such points. By the law of large numbers, almost 
surely 

,. A n (i,j) A 1 .(i,j) , \ v A -i( 

Recall that 

2 2 

fe=i i=i 

It is enough to look at the term with I = 1 and k = 1 in S(i,j), i.e. the term 

{A ll {i,j)-A x .{i,j)A. l {i,j)/{N-2)Y 



A 1 .(i,j)A. 1 (i,j)/(N-2) 



It follows that S > Si(i,j), and therefore that our test statistic T > Ef^i 
By Slutzky's theorem, almost surely 

l im = ii m 1 {An(i,j)-A 1 .(i,j)A. 1 (i,j)/(N-2)} 2 



n^ooN-2 A 1 .(i,j)A. 1 (i,j)/(N - 2) 

{Pr{X = x Q ,Y = y ) - Pr(X = x )Pr(Y = y )} 2 
Pr{X = xo)Pr(Y = y ) 

It follows that S\(i, j)/ (N — 2) converges almost surely to a positive constant 
d > 0. Therefore, S\{i,j) > (N - 2)c'/2 with probability going to 1 as N -> 
oo. Since we have order of magnitude of N 2 pairs of points i and j that satisfy the 
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inequality S\(i,j) > (N — 2)c'/2, it follows that there exists a constant 5 such that 
lirriAT^oo Pr(T > SN 3 ) = 1. By the same argument as in the last paragraph of the 
Appendix in the main text, it therefore follows that T will be rejected with probability 
1. 

A.1.2 Proof of mixed case 4 

Suppose X G W is discrete with countable support, and Y G 5ft 9 has a continuous 
density given X, denoted by h(y \ X = x), and a marginal density g{y). H n is false 
implies that there exists at least one pair of points xo, yo such that Pr(X = xo)h(Y = 
yo | X = xo) > Pr(X — x )g(Y = yo). Since h(- \ X = x ) is continuous, there 
exists a radius R such that Pr(X = xo)h(Y = y | X = xq) > Pr(X = xo)g(Y = y) 
for (x, y) e A = {(x, y) : x — xq, d(y, yo) < R}. The set A has positive probability, 
and moreover 

mm{Pr(X = x )h(Y = y | X = x ) - Pr(X = x )g(Y = y)} > 0. 
A 

Denote this minimum by the positive constant c. 

Clearly the following two subsets of A have positive probability as well: 

M = {(x, y) : x = xo, d(y, y ) < R/8} 

and 

A 2 = {(x, y) : x = x a ,3R/8 < d{y, y ) < R/2}. 

Denote the probabilities of Ai and Ai by /i and fa respectively. Therefore, we expect 
(Nfi)(Nf 2 ) pairs of sample points i and j such that (xi,yi) e A\ and (xj, yj) G A 2 - 

For these sample points i and j, d(yi,yj) > R/A. From Lemma 1 in the Appendix, 
if d(yi t ,y i ) < d(yi,yj), then d(yk,yo) < R- From Lemma 2 in the Appendix, if 
d(y k ,y ) < R/8, then d(y k ,yi) < d(yi,yj). Therefore, if (x k ,y k ) G Ai, then k is 
closer to i than to j in Y. 

By the law of large numbers, almost surely 

lim = Pr{X = xo) I h(y | X = x )dy 

N^ao N - 2 J M 

where A 3 = {(x,y) : x = x a ,d(y 7 y t ) < d(yi,yj)}, , and A A = {y 

d(y t ,Vj)} ■ 
Recall that 

2 2 

s &j) = EE { A kAhj)- A k .(i,j)A 4 (i,j)/(N - 2)} 2 /{A k .^j)A,( l ,j)/(N-2)}. 



(8) 

(9) 
(10) 

: d(y,yi) < 
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It is enough to look at the term with I = 1 and k = 1 in S(i, j), i.e. the term 



q( . n _ {Au(i,j) - A 1 .(i,j)A. 1 (i,j)/(N - 2)} 2 
1[,J) A 1 .(i,j)A. 1 (i,j)/(N-2) 

It follows that S > Si(i,j), and therefore that our test statistic T > J2iLi J2f^ ?')• 
By Slutzky's theorem and the continuous mapping theorem, almost surely 

Um giM = Um 1 {^n(i,i)-^i-(i,j)Ai(i,j)/(iV-2)} 2 



at^oo TV -2 w^ooiV-2 Ai.(i,j)Ai(i,j)/(JV-2) 

Pr(X = ^o)[/^{% | X = x )dy - g(y)}dy} 2 

Sju 9 ^ dy 

It follows that S\(i, j)/(N — 2) converges almost surely to a positive constant c' > 
0. Therefore, Si(i, j) > (N — 2)c'/2 with probability going to 1 as N — > oo. Since we 
expect (7V/i)(7V/ 2 ) pairs of sample points i and j that satisfy the inequality Si(i,j) > 
(N — 2)c'/2, it follows that there exists a constant S such that limjv^oo Pr(T > 
SN 3 ) — 1. By the same argument as in the last paragraph in the Appendix of the main 
text, it therefore follows that T will be rejected with probability 1 . 



A.2 Computational Complexity 

In this Section we give a C implementation of the computation of {inv(j) : j — 
l,...,N,j ^ i} in order of magnitude iVlog N. The algorithm uses an adaptation 
of the classic merge sort algorithm. The basic idea is to split the array in half and 
sort each half while counting the number of inversions for each element in each half. 
In the merging stage of both halves, if an element in the right side is smaller than an 
element in the left side, it means that the number of inversions for the smaller element 
should be updated by adding to it the number of elements on the left side which are 
larger than it. The complexity of this algorithm T(N) respects the recursion T(N) = 
2T(N/2) + 0(N) and therefore it is T(N) = 0(N log N). The C code is given below. 



int Inversions ( int *permutation, int *source, int 
*inversion_count, int dim) { 

if (dim==l) 

return ; 

else { 

Inversions (permutation, source, inversion_count , dim/2); 

Inversions ( Spermutat ion [ dim/ 2], Ssource [dim/ 2 ] , inver sion_count , dim/ 2 ) ; 
Merge (permutation, source, inversion_count , dim) ; 

} 

return 0; 



int Merge (int *permutation, int *source, int *inversion_count, int 
dim) { 

int i; 
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int left [MAX_DIM] , right [MAX_DIM] , lef t_source [MAX_DIM] , right_source [MAX_DIM] ; 
int left_index=0, right_index=0 ; 
for (i=0; i<dim/2; i++) { 

left [ i ] =permutat ion [ i ] ; 

lef t_source [i]=source[i] ; 

} 

for (i=0; i<dim/2; i++) { 

right [ i ] =permutat ion [ i+dim/ 2 ] ; 
right_source [ i ] = source [i+dim/ 2 ] ; 

} 

f or ( i=0 ; i<dim; i++) { 

if ( ( lef t_index<dim/2 ) && (right_index<dim/2 ) ) { 
if (left [left_index] <right [right_index] ) { 
permutation [i] =left [left_index] ; 
source [ i ] =lef t_source [ lef t_index] ; 
lef t_index++; 

} 

else { 

permutation [i] =right [right_index] ; 
source [ i ] =right_source [ right_index] ; 

printf ( "adding %d invs to %d\n", dim/2-lef t_index, source [i]); 
inversion_count [source [i] ] += (dim/2-lef t_index) ; 
right_index++; 

} 

} 

else { 

if ( lef t_index<dim/2 ) { 

permutation [i] =left [left_index] ; 
source [ i ] =lef t_source [ lef t_index] ; 
lef t_index++; 

} 

if (right_index<dim/2 ) { 

permutation [i] =right [right_index] ; 
source [ i ] =right_source [ right_index] ; 
right_index++ ; 



} 

} 

return 0; 



A.3 Simulations 

In the simulations presented in the main text, we first considered the six simulated 
examples of unusual bivariate distributions. Figure Q] shows the scatter plots for a 
sample of size N — 50 from each of these distributions. 

In the simulations presented in the main text, the last example was of a mixture 
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Figure 1: Six simulated examples of unusual bivariate distributions; a sample of size 
N=50 from each distribution. 
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Figure 2: A scatter plot of the first coordinate in the mixture distribution of 10 compo- 
nents, where each coordinate has a t distribution with 3df around a different center. Left 
panel, noise 10 times smaller than generated; Right panel, noise used in the simulation. 

distribution in 1000 dimensions. Figure |2] shows the first coordinate of X and Y in 
a setting where the standard deviation of the noise is 10 times smaller than actually 
generated (Left), as well as with the actual noise used in the simulation (Right panel), 
for the multivariate t distribution with 3df. 

A more sophisticated scenario in 100 dimensions, which includes both a monotone 
and non-monotone component, is the following: Yj = /3±Xj + fiiX?- + G I\ 
and Yj = e^j G {1, . . . , 10Q}\.Zi, with e j ~ N(0,E x ) and X ~ N(0,E x ). The 
covariance matrix T, x is block diagonal, with symmetric correlation of 0.9 in the first 
block, 0.8 in the second block, etc. The last block has correlation, and the diagonal 
entries of H x are 1 . In the null setting where I\ = 0, the empirical power for the 
new test, based on 1000 simulations, was 0.046, 0.043, and 0.051 for N = 30,40, 
and 50, respectively. Table [7] shows the power of a test at level 0.05 for dCov as well 
as for the new test for /3\ — 1,02 = 4, c 2 = 9, and two configurations of I±. The 
power of the new test is better than that of dCov in the settings considered, in which 
the non-monotone part of the relationship has a stronger effect than the monotone part 
of the relationship. Moreover, the power of both tests is larger in the first setting, of 
strong dependence between the coordinates of X, than in the second setting, where the 
dependence across coordinates is weaker, since in the first setting the highly associ- 
ated components of X cause dependence between each coordinate of Y with several 
coordinates of X. 



A.4 A univariate example 

Szekely and Rizzol ( 20091) examined the Saviotti aircraft data of Saviotti ( 19961) . that 



records six characteristics of aircraft designs during the twentieth century. They con- 
sider two variables, wing span (m) and speed (km/h) for the 2 30 designs of the third 
(of thr ee) periods. This example and the data (aircraft) are from lBowman and Azzalini 
( 1997b . They showed that the dCov test of independence of log(Speed) and log(Span) 
in period 3 is significant (p-value < 0.00001), while the Pearson correlation test is not 
significant (p-value = 0.8001). Our proposed test is also highly significant (p-value 
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Table 7: The power of a test at level 0.05 per sample size from a 100 dimensional joint 
distribution, where Yj = X, + 4Xf + ej,j G h and Yj = e if j G {1, . . . , 100}\Ji, 
with 6j ~ iV(0, 9). The results are based on 1000 simulations. 



h 


Sample size 


dCov 


new test 


{1, 


..,10,51, 


..,55} 


N = 30 


0.382 


0.629 








iV = 40 


0.456 


0.782 








N = 50 


0.541 


0.879 


{41,.. 


.,50,91,.. 


. , 100} 


AT = 30 


0.246 


0.243 








A^ = 40 


0.271 


0.340 








AT = 50 


0.293 


0.474 








AT = 60 


0.359 


0.553 








N = 7Q 


0.369 


0.626 








N = 80 


0.433 


0.673 



< 0.00001). Moreover, if we take a random sample of 30 observations and apply the 
dCov test and the proposed test to this small random sample, then we typically get 
smaller p-values using our proposed test than using the dCov test. Specifically, repeat- 
ing the testing of a random sample of 30 observations 100 times, the p-value of our 
proposed test was below 0.05 for 58/100 simulation runs, whereas for dCov only for 
18/100 simulation runs. Figure [3] shows the scatter plot of wing span vs. speed on the 
log scale for a sample of 30 points. The relationship appears fan-like. For this partic- 
ular sample, the p-value from the dCov test and our proposed test were 0.21 and 0.03, 
respectively. Figure|4]shows the distribution of the lOOp-values for each of the tests. 
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Figure 3: The scatter of wing span vs. speed on the log scale for a sample of 30 points. 
The p-value from the dCov test and our proposed test were 0.21 and 0.03, respectively. 



Figure 4: The boxplots of the 100 p- values for dCov and the proposed test based on a 
random sample of 30 points from the Aircraft data. 
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